This R Markdown script analyses behavioural data from the FAB (face attention bias) paradigm of the EMBA project. The data was preprocessed before being read into this script.
The task is modeled after Jakobsen et al. (2021), Attention, Perception, & Psychophysics and the authors were happy to share their stimuli. Each trial starts with a black fixation cross on a white background. Then, a cue consisting of a pair of pictures, one object and one face, is shown with one picture on the left and one on the right of the previous location of the fixation cross. In line with Moore et al. (2012), J Autism Dev Disord, we set the duration of the cue presentation to 200ms. Afterwards, a target square appears either at the previous location of the face or the object. Subjects task is to determine the location (right or left) of the target as fast and accurate as possible. The target only disappears when the participant gives their answer.
# number of simulations
nsim = 250
# set number of iterations and warmup for models
iter = 3000
warm = 1000
# set the seed
set.seed(2468)
The following packages are used in this RMarkdown file:
## [1] "R version 4.4.1 (2024-06-14)"
## [1] "knitr version 1.46"
## [1] "ggplot2 version 3.5.1"
## [1] "brms version 2.21.0"
## [1] "designr version 0.1.13"
## [1] "bridgesampling version 1.1.2"
## [1] "tidyverse version 2.0.0"
## [1] "ggpubr version 0.6.0"
## [1] "ggrain version 0.0.4"
## [1] "bayesplot version 1.11.1"
## [1] "SBC version 0.3.0.9000"
## [1] "rstatix version 0.7.2"
## [1] "BayesFactor version 0.9.12.4.7"
## [1] "bayestestR version 0.13.2"
We planned to determine the group-level effect subjects following Barr (2013). For each model, experiment specific priors were set based on previous literature or the task (see comments in the code).
We perform prior predictive checks as proposed in Schad, Betancourt and Vasishth (2020) using the SBC package. To do so, we create 250 simulated datasets where parameters are simulated from the priors. These parameters are used to create one fake dataset. Both the true underlying parameters and the simulated discrimination values are saved.
Then, we create graphs showing the prior predictive distribution of the simulated discrimination threshold to check whether our priors fit our general expectations about the data. Next, we perform checks of computational faithfulness and model sensitivity as proposed by Schad, Betancourt and Vasishth (2020) and implemented in the SBC package. We create models for each of the simulated datasets. Last, we calculate performance metrics for each of these models, focusing on the population-level parameters.
First, we load the data and combine it with demographic information including the diagnostic status of the subjects. Then, all predictors are set to sum contrasts. We have a look at the demographics describing our three diagnostic groups: adults with ADHD, autistic adults and adults without any neurological and psychiatric diagnoses.
Since this is sensitive data, we load the anonymised version of the processed data at this point but also leave the code we used to create it.
# check if the data file exists, if yes load it:
if (!file.exists("FAB_data.RData")) {
# get demo info for subjects
df.sub = read_csv(file.path("/home/emba/Documents/EMBA/CentraXX", "EMBA_centraXX.csv"), show_col_types = F) %>%
mutate(
diagnosis = recode(diagnosis, "CTR" = "COMP")
)
# set the data path
dt.path = "/home/emba/Documents/EMBA/BVET"
# load excluded participants (low accuracy)
exc = scan(file.path(dt.path, 'FAB_exc.txt'), what="character", sep=NULL)
df.exc = df.sub %>% filter(subID %in% exc) %>% select(diagnosis) %>% group_by(diagnosis) %>% count()
# load the behavioral data and merge with group
df.fab = merge(df.sub %>% select(subID, diagnosis),
readRDS(file = paste0(dt.path, '/df_FAB.RDS')), all.y = T) %>%
mutate_if(is.character, as.factor)
# only keep participants included in the study in the subject data frame
df.sub = merge(df.fab %>% select(subID) %>% distinct(), df.sub, all.x = T)
# load the eye tracking data and only keep participants included in the study
load(file.path(dt.path, "FAB_ET_data.RData"))
df.sac = merge(df.sac, df.sub %>% select(subID, diagnosis), keep.y = T)
# anonymise the data
df.fab = df.fab %>%
mutate(
PID = subID,
subID = as.factor(as.numeric(subID))
)
# get a correspondence of original PIDs and anonymised subIDs
df.recode = df.fab %>% select(PID, subID) %>% distinct()
recode = as.character(df.recode$subID)
names(recode) = df.recode$PID
df.fab = df.fab %>% select(-PID)
# anonymise ET data in the same way
df.sac$subID = str_replace_all(df.sac$subID, recode)
# print gender frequencies and compare them across groups
tb.gen = xtabs(~ gender + diagnosis, data = df.sub)
ct.full = contingencyTableBF(tb.gen, sampleType = "indepMulti", fixedMargin = "cols")
# since only DAN in the ADHD group, we try again after excluding them
ct.mf = contingencyTableBF(tb.gen[2:3,], sampleType = "indepMulti", fixedMargin = "cols")
# check which outcomes of interest are normally distributed
df.sht = df.sub %>%
group_by(diagnosis) %>%
shapiro_test(age, iq, BDI_total, ASRS_total, RAADS_total, TAS_total) %>%
mutate(
sig = if_else(p < 0.05, "*", "")
)
# most of the measures are not normally distributed, therefore, we compute ranks for these outcomes
df.sub = df.sub %>%
mutate(
rage = rank(age),
rBDI = rank(BDI_total),
rRAADS = rank(RAADS_total),
rTAS = rank(TAS_total),
diagnosis = as.factor(diagnosis)
)
# now we can compute our ANOVAs
aov.age = anovaBF(rage ~ diagnosis, data = df.sub)
aov.iq = anovaBF(iq ~ diagnosis, data = df.sub)
aov.BDI = anovaBF(rBDI ~ diagnosis, data = df.sub)
aov.ASRS = anovaBF(ASRS_total ~ diagnosis, data = df.sub)
aov.RAADS = anovaBF(rRAADS ~ diagnosis, data = df.sub)
aov.TAS = anovaBF(rTAS ~ diagnosis, data = df.sub)
# ...and put everything in a new dataframe for printing
measurement = "Age"
ADHD = sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "ADHD",]$age), sd(df.sub[df.sub$diagnosis == "ADHD",]$age)/sqrt(nrow(df.sub[df.sub$diagnosis == "ADHD",])))
ASD = sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "ASD",]$age), sd(df.sub[df.sub$diagnosis == "ASD",]$age)/sqrt(nrow(df.sub[df.sub$diagnosis == "ASD",])))
COMP = sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "COMP",]$age), sd(df.sub[df.sub$diagnosis == "COMP",]$age)/sqrt(nrow(df.sub[df.sub$diagnosis == "COMP",])))
logBF10 = sprintf("%.3f", aov.age@bayesFactor[["bf"]])
df.table = data.frame(measurement, ADHD, ASD, COMP, logBF10)
df.table = rbind(df.table,
c(
"ASRS",
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "ADHD",]$ASRS_total), sd(df.sub[df.sub$diagnosis == "ADHD",]$ASRS_total)/sqrt(nrow(df.sub[df.sub$diagnosis == "ADHD",]))),
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "ASD",]$ASRS_total), sd(df.sub[df.sub$diagnosis == "ASD",]$ASRS_total)/sqrt(nrow(df.sub[df.sub$diagnosis == "ASD",]))),
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "COMP",]$ASRS_total), sd(df.sub[df.sub$diagnosis == "COMP",]$ASRS_total)/sqrt(nrow(df.sub[df.sub$diagnosis == "COMP",]))),
sprintf("%.3f", aov.ASRS@bayesFactor[["bf"]])
),
c(
"BDI",
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "ADHD",]$BDI_total), sd(df.sub[df.sub$diagnosis == "ADHD",]$BDI_total)/sqrt(nrow(df.sub[df.sub$diagnosis == "ADHD",]))),
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "ASD",]$BDI_total), sd(df.sub[df.sub$diagnosis == "ASD",]$BDI_total)/sqrt(nrow(df.sub[df.sub$diagnosis == "ASD",]))),
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "COMP",]$BDI_total), sd(df.sub[df.sub$diagnosis == "COMP",]$BDI_total)/sqrt(nrow(df.sub[df.sub$diagnosis == "COMP",]))),
sprintf("%.3f", aov.BDI@bayesFactor[["bf"]])
),
c(
"Gender (diverse/agender/non-binary - female - male)",
sprintf("%d - %d - %d", nrow(df.sub[df.sub$diagnosis == "ADHD" & df.sub$gender == "dan",]), nrow(df.sub[df.sub$diagnosis == "ADHD" & df.sub$gender == "fem",]), nrow(df.sub[df.sub$diagnosis == "ADHD" & df.sub$gender == "mal",])),
sprintf("%d - %d - %d", nrow(df.sub[df.sub$diagnosis == "ASD" & df.sub$gender == "dan",]), nrow(df.sub[df.sub$diagnosis == "ASD" & df.sub$gender == "fem",]), nrow(df.sub[df.sub$diagnosis == "ASD" & df.sub$gender == "mal",])),
sprintf("%d - %d - %d", nrow(df.sub[df.sub$diagnosis == "COMP" & df.sub$gender == "dan",]), nrow(df.sub[df.sub$diagnosis == "COMP" & df.sub$gender == "fem",]), nrow(df.sub[df.sub$diagnosis == "COMP" & df.sub$gender == "mal",])),
sprintf("%.3f", ct.full@bayesFactor[["bf"]])
),
c(
"IQ",
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "ADHD",]$iq), sd(df.sub[df.sub$diagnosis == "ADHD",]$iq)/sqrt(nrow(df.sub[df.sub$diagnosis == "ADHD",]))),
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "ASD",]$iq), sd(df.sub[df.sub$diagnosis == "ASD",]$iq)/sqrt(nrow(df.sub[df.sub$diagnosis == "ASD",]))),
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "COMP",]$iq), sd(df.sub[df.sub$diagnosis == "COMP",]$iq)/sqrt(nrow(df.sub[df.sub$diagnosis == "COMP",]))),
sprintf("%.3f", aov.iq@bayesFactor[["bf"]])
),
c(
"RAADS",
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "ADHD",]$RAADS_total), sd(df.sub[df.sub$diagnosis == "ADHD",]$RAADS_total)/sqrt(nrow(df.sub[df.sub$diagnosis == "ADHD",]))),
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "ASD",]$RAADS_total), sd(df.sub[df.sub$diagnosis == "ASD",]$RAADS_total)/sqrt(nrow(df.sub[df.sub$diagnosis == "ASD",]))),
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "COMP",]$RAADS_total), sd(df.sub[df.sub$diagnosis == "COMP",]$RAADS_total)/sqrt(nrow(df.sub[df.sub$diagnosis == "COMP",]))),
sprintf("%.3f", aov.RAADS@bayesFactor[["bf"]])
),
c(
"TAS",
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "ADHD",]$TAS_total), sd(df.sub[df.sub$diagnosis == "ADHD",]$TAS_total)/sqrt(nrow(df.sub[df.sub$diagnosis == "ADHD",]))),
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "ASD",]$TAS_total), sd(df.sub[df.sub$diagnosis == "ASD",]$TAS_total)/sqrt(nrow(df.sub[df.sub$diagnosis == "ASD",]))),
sprintf("%.2f (±%.2f)", mean(df.sub[df.sub$diagnosis == "COMP",]$TAS_total), sd(df.sub[df.sub$diagnosis == "COMP",]$TAS_total)/sqrt(nrow(df.sub[df.sub$diagnosis == "COMP",]))),
sprintf("%.3f", aov.TAS@bayesFactor[["bf"]])
)
)
# save it all
save(df.fab, df.sac, df.table, df.sht, ct.full, ct.mf, df.exc, file = "FAB_data.RData")
} else {
load("FAB_data.RData")
}
# print the group of excluded participants
kable(df.exc)
| diagnosis | n |
|---|---|
| ADHD | 1 |
| ASD | 1 |
rm(df.exc)
# print the outcome of the shapiro tests
kable(df.sht)
| diagnosis | variable | statistic | p | sig |
|---|---|---|---|---|
| ADHD | ASRS_total | 0.9304718 | 0.1119616 | |
| ADHD | BDI_total | 0.8170622 | 0.0007279 | * |
| ADHD | RAADS_total | 0.9257590 | 0.0885842 | |
| ADHD | TAS_total | 0.9593558 | 0.4504806 | |
| ADHD | age | 0.9180376 | 0.0604839 | |
| ADHD | iq | 0.9648751 | 0.5684099 | |
| ASD | ASRS_total | 0.9512379 | 0.2881380 | |
| ASD | BDI_total | 0.8078769 | 0.0003974 | * |
| ASD | RAADS_total | 0.9449707 | 0.2103029 | |
| ASD | TAS_total | 0.9181140 | 0.0530706 | |
| ASD | age | 0.9492659 | 0.2611939 | |
| ASD | iq | 0.9579062 | 0.3978177 | |
| COMP | ASRS_total | 0.9192136 | 0.0561186 | |
| COMP | BDI_total | 0.7421016 | 0.0000383 | * |
| COMP | RAADS_total | 0.8449398 | 0.0017635 | * |
| COMP | TAS_total | 0.8727860 | 0.0059679 | * |
| COMP | age | 0.8104190 | 0.0004382 | * |
| COMP | iq | 0.9505974 | 0.2791247 |
rm(df.sht)
# print the outcome of the two contingency tables for comparison
ct.full@bayesFactor
## bf error time code
## Non-indep. (a=1) -3.748042 0 Wed Sep 11 10:12:21 2024 294e54f5e70e
ct.mf@bayesFactor
## bf error time code
## Non-indep. (a=1) -1.915524 0 Wed Sep 11 10:12:21 2024 294ee08e9ff
# set and print the contrasts
contrasts(df.fab$cue) = contr.sum(2)
contrasts(df.fab$cue)
## [,1]
## face 1
## object -1
contrasts(df.fab$diagnosis) = contr.sum(3)
contrasts(df.fab$diagnosis)
## [,1] [,2]
## ADHD 1 0
## ASD 0 1
## COMP -1 -1
# print final group comparisons for the paper
kable(df.table)
| measurement | ADHD | ASD | COMP | logBF10 |
|---|---|---|---|---|
| Age | 26.70 (±1.51) | 28.58 (±1.46) | 27.42 (±1.15) | -1.774 |
| ASRS | 43.39 (±2.61) | 32.54 (±1.62) | 25.04 (±1.68) | 12.024 |
| BDI | 8.09 (±1.77) | 11.21 (±2.12) | 2.25 (±0.62) | 8.319 |
| Gender (diverse/agender/non-binary - female - male) | 2 - 9 - 12 | 0 - 13 - 11 | 0 - 11 - 13 | -3.748 |
| IQ | 108.35 (±2.46) | 111.31 (±2.95) | 109.90 (±1.92) | -1.859 |
| RAADS | 92.61 (±8.74) | 152.92 (±8.32) | 44.58 (±7.15) | 23.564 |
| TAS | 50.22 (±2.58) | 63.17 (±1.48) | 39.08 (±1.93) | 19.247 |
The three diagnostic groups are similar in age, IQ and gender distribution. However, they seem to differ in their questionnaire scores measuring ADHD (ASRS), depression (BDI), autism (RAADS) and alexithymia (TAS).
First, we analyse the reaction times for all correctly answered trials to assess whether participants answer faster if the target appears at the previous location of the face, which we refer to as face attention bias (FAB). In our preregistration, we formulated the following hypotheses:
code = "FAB"
# full model formula
f.fab = brms::bf(rt.cor ~ diagnosis * cue + (cue | subID) + (cue * diagnosis | stm) )
# set informed priors based on previous results
priors = c(
# general priors based on SBV
prior(normal(6, 0.3), class = Intercept),
prior(normal(0, 0.5), class = sigma),
prior(normal(0, 0.1), class = sd),
prior(lkj(2), class = cor),
# face attention bias effect based on Jakobsen et al. (2021)
prior(normal(-0.01, 0.04), class = b, coef = cue1),
# ADHD subjects being slower based on Pievsky & McGrath (2018)
prior(normal(0.025, 0.04), class = b, coef = diagnosis1),
# ASD subjects being slower based on Morrison et al. (2018)
prior(normal(0.025, 0.04), class = b, coef = diagnosis2),
# decreased FAB in ASD subjects based on Moore et al. (2012)
prior(normal(0.01, 0.04), class = b, coef = diagnosis2:cue1),
# no specific expectations for FAB in ADHD
prior(normal(0, 0.04), class = b),
# shift
prior(normal(200, 100), class = ndt)
)
# check if the SBC already exists
if (file.exists(file.path(cache_dir, sprintf("df_res_%s.rds", code)))) {
# load in the results of the SBC
df.results = readRDS(file.path(cache_dir, sprintf("df_res_%s.rds", code)))
df.backend = readRDS(file.path(cache_dir, sprintf("df_div_%s.rds", code)))
dat = readRDS(file.path(cache_dir, sprintf("dat_%s.rds", code)))
} else {
# perform the SBC
gen = SBC_generator_brms(f.fab, data = df.fab, prior = priors, family = shifted_lognormal,
thin = 50, warmup = 20000, refresh = 2000)
bck = SBC_backend_brms_from_generator(gen, chains = 4, thin = 1,
init = 0.1, warmup = warm, iter = iter)
dat = generate_datasets(gen, nsim)
saveRDS(dat, file.path(cache_dir, sprintf("dat_%s.rds", code)))
res = compute_SBC(dat,
bck,
cache_mode = "results",
cache_location = file.path(cache_dir, sprintf("res_%s", code)))
df.results = res$stats
df.backend = res$backend_diagnostics
saveRDS(df.results, file = file.path(cache_dir, paste0("df_res_", code, ".rds")))
saveRDS(df.backend, file = file.path(cache_dir, paste0("df_div_", code, ".rds")))
}
We start by investigating the rhats and the number of divergent samples. This shows that 6 of 250 simulations had at least one parameter that had an rhat of at least 1.05, and 1 models had divergent samples (mean number of samples of the simulations with divergent samples: 4). This suggests that this model performs well.
Next, we can plot the simulated values to perform prior predictive checks.
# create a matrix out of generated data
dvname = gsub(" ", "", gsub("[\\|~].*", "", f.fab)[1])
dvfakemat = matrix(NA, nrow(dat[['generated']][[1]]), length(dat[['generated']]))
for (i in 1:length(dat[['generated']])) {
dvfakemat[,i] = dat[['generated']][[i]][[dvname]]
}
truePars = dat$variables
# set large values to a max
dvfakemat[dvfakemat > 2000] = 2000
# compute one histogram per simulated data-set
binwidth = 20
breaks = seq(0, max(dvfakemat, na.rm=T) + binwidth, binwidth)
histmat = matrix(NA, ncol = length(dat), nrow = length(breaks)-1)
for (i in 1:nrow(truePars)) {
histmat[,i] = hist(dvfakemat[,i], breaks = breaks, plot = F)$counts
}
# for each bin, compute quantiles across histograms
probs = seq(0.1, 0.9, 0.1)
quantmat= as.data.frame(matrix(NA, nrow=dim(histmat)[1], ncol = length(probs)))
names(quantmat) = paste0("p", probs)
for (i in 1:dim(histmat)[1]) {
quantmat[i,] = quantile(histmat[i,], p = probs)
}
quantmat$x = breaks[2:length(breaks)] - binwidth/2 # add bin mean
p1 = ggplot(data = quantmat, aes(x = x)) +
geom_ribbon(aes(ymax = p0.9, ymin = p0.1), fill = c_light) +
geom_ribbon(aes(ymax = p0.8, ymin = p0.2), fill = c_light_highlight) +
geom_ribbon(aes(ymax = p0.7, ymin = p0.3), fill = c_mid) +
geom_ribbon(aes(ymax = p0.6, ymin = p0.4), fill = c_mid_highlight) +
geom_line(aes(y = p0.5), colour = c_dark, linewidth = 1) +
labs(title = "Distribution of simulated discriminations", y = "", x = "") +
theme_bw()
tmpM = apply(dvfakemat, 2, mean) # mean
tmpSD = apply(dvfakemat, 2, sd)
p2 = ggplot() +
stat_bin(aes(x = tmpM), fill = c_dark) +
labs(x = "Mean RTs (ms)", title = "Means of simulated RTs") +
theme_bw()
p3 = ggplot() +
stat_bin(aes(x = tmpSD), fill = c_dark) +
labs(x = "SD RTs (ms)", title = "SDs of simulated RTs") +
theme_bw()
p = ggarrange(p1,
ggarrange(p2, p3, ncol = 2, labels = c("B", "C")),
nrow = 2, labels = "A")
annotate_figure(p, top = text_grob("Prior predictive checks: reaction times", face = "bold", size = 14))
Subfigure A shows the distribution of the simulated data with bluer bands being more likely than greyer bands. It shows a distribution that fits our expectations about reaction times in a simple decision task. The same applies to the distribution of the means and standard deviations in the simulated datasets. We go ahead with these priors and check the results of the SBC. We only plot the results from the models that had no divergence issues.
# get simulation numbers with issues
rank = max(df.results$max_rank)
check = merge(df.results %>%
group_by(sim_id) %>% summarise(rhat = max(rhat, na.rm = T), mean_rank = mean(max_rank)) %>%
filter(rhat >= 1.05 | mean_rank != rank),
df.backend %>% filter(n_divergent > 0), all = T)
# plot SBC with functions from the SBC package focusing on population-level parameters
a = 0.5
df.results.b = df.results %>%
filter(substr(variable, 1, 2) == "b_") %>%
filter(!(sim_id %in% check$sim_id))
p1 = plot_ecdf_diff(df.results.b) + theme_bw() + theme(legend.position = "none") +
scale_x_continuous(breaks=scales::pretty_breaks(n = 3)) +
scale_y_continuous(breaks=scales::pretty_breaks(n = 3))
p2 = plot_rank_hist(df.results.b, bins = 20) + theme_bw() +
scale_x_continuous(breaks=scales::pretty_breaks(n = 3)) +
scale_y_continuous(breaks=scales::pretty_breaks(n = 3))
p3 = plot_sim_estimated(df.results.b, alpha = 0.5) + theme_bw() +
scale_x_continuous(breaks=scales::pretty_breaks(n = 3)) +
scale_y_continuous(breaks=scales::pretty_breaks(n = 3))
p4 = plot_contraction(df.results.b, prior_sd = setNames(c(0.15, rep(0.10, length(unique(df.results.b$variable))-1)), unique(df.results.b$variable))) +
theme_bw() +
scale_x_continuous(breaks=scales::pretty_breaks(n = 3)) +
scale_y_continuous(breaks=scales::pretty_breaks(n = 3))
p = ggarrange(p1, p2, p3, p4, labels = "AUTO", ncol = 1, nrow = 4)
annotate_figure(p, top = text_grob("Computational faithfulness and model sensitivity", face = "bold", size = 14))
Next, we check the ranks of the parameters. If the model is unbiased, these should be uniformly distributed (Schad, Betancourt and Vasishth, 2020). The sample empirical cumulative distribution function (ECDF) lies within the theoretical distribution (95%) and the rank histogram also shows ranks within the 95% expected range, although there are some small deviations. We judge this to be acceptable.
Then, we investigated the relationship between the simulated true parameters and the posterior estimates. Although there are individual values diverging from the expected pattern, most parameters were recovered successfully within an uncertainty interval of alpha = 0.05.
Last, we explore the z-score and the posterior contraction of our population-level predictors. The z-score “determines the distance of the posterior mean from the true simulating parameter”, while the posterior contraction “estimates how much prior uncertainty is reduced in the posterior estimation” (Schad, Betancourt and Vasisth, 2020). All of this looks good for this model.
As the next step, we fit the model to the data, check whether there are divergence or rhat issues, and then check whether the chains have converged.
# fit the full model
set.seed(2469)
m.fab = brm(f.fab,
df.fab, prior = priors,
family = shifted_lognormal,
iter = iter, warmup = warm,
backend = "cmdstanr", threads = threading(8),
file = "m_fab_full"
)
rstan::check_hmc_diagnostics(m.fab$fit)
##
## Divergences:
## 0 of 8000 iterations ended with a divergence.
##
## Tree depth:
## 0 of 8000 iterations saturated the maximum tree depth of 10.
##
## Energy:
## E-BFMI indicated no pathological behavior.
# check that rhats are below 1.01
sum(brms::rhat(m.fab) >= 1.01, na.rm = T)
## [1] 1
# check the trace plots
post.draws = as_draws_df(m.fab)
mcmc_trace(post.draws, regex_pars = "^b_",
facet_args = list(ncol = 3)) +
scale_x_continuous(breaks=scales::pretty_breaks(n = 3)) +
scale_y_continuous(breaks=scales::pretty_breaks(n = 3))
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
This model has no pathological behaviour with E-BFMI, no divergent samples and only one rhat that is higher or equal to 1.01. Therefore, we go ahead and perform our posterior predictive checks.
# get posterior predictions
post.pred = posterior_predict(m.fab, ndraws = nsim)
# check the fit of the predicted data compared to the real data
p1 = pp_check(m.fab, ndraws = nsim) +
theme_bw() + theme(legend.position = "none") + xlim(0, 2000)
# get rid of NAs in data frame for plotting
df.fab.na = df.fab[!is.na(df.fab$rt.cor),]
# distributions of means compared to the real values per group
p2 = ppc_stat_grouped(df.fab.na$rt.cor, post.pred, df.fab.na$diagnosis) +
theme_bw() + theme(legend.position = "none")
# distributions of means compared to the real values per cue
p3 = ppc_stat_grouped(df.fab.na$rt.cor, post.pred, df.fab.na$cue) +
theme_bw() + theme(legend.position = "none")
# distributions of means compared to the real values per group for faces
p4 = ppc_stat_grouped(df.fab.na[df.fab.na$cue == "face",]$rt.cor, post.pred[,which(df.fab.na$cue == "face")], df.fab.na[df.fab.na$cue == "face",]$diagnosis) +
theme_bw() + theme(legend.position = "none")
# distributions of means compared to the real values per group for objects
p5 = ppc_stat_grouped(df.fab.na[df.fab.na$cue == "face",]$rt.cor, post.pred[,which(df.fab.na$cue == "face")], df.fab.na[df.fab.na$cue == "face",]$diagnosis) +
theme_bw() + theme(legend.position = "none")
p = ggarrange(p1, p2, p3, p4, p5,
nrow = 5, ncol = 1, labels = "AUTO")
annotate_figure(p, top = text_grob("Posterior predictive checks: RTs", face = "bold", size = 14))
Although the overall shape in subfigure A of the simulated data fits well with the real data, the model seems to underestimate the reaction times of the ADHD group and overestimate the reaction times of the COMP group. This is also apparent when looking at the two cue conditions separately.
Since we are interested in accurate estimates, we decide to aggregate with the median of the reaction times per stimulus (face object combination) and cue. Then, there are no missing values in the data and we model an estimate for each specific stimulus and cue combination for each participant.
First, we compute the aggregation and have a quick look at the resulting data.
# aggregate reaction times
df.fab = df.fab %>%
group_by(subID, diagnosis, stm, cue) %>%
summarise(
rt.cor = median(rt.cor, na.rm = T)
) %>% ungroup() %>%
mutate_if(is.character, as.factor)
# set and print the contrasts
contrasts(df.fab$cue) = contr.sum(2)
contrasts(df.fab$cue)
## [,1]
## face 1
## object -1
contrasts(df.fab$diagnosis) = contr.sum(3)
contrasts(df.fab$diagnosis)
## [,1] [,2]
## ADHD 1 0
## ASD 0 1
## COMP -1 -1
summary(df.fab)
## subID diagnosis stm cue rt.cor
## 1 : 72 ADHD:1656 1_10 : 142 face :2556 Min. :256.0
## 2 : 72 ASD :1728 1_11 : 142 object:2556 1st Qu.:361.5
## 3 : 72 COMP:1728 1_12 : 142 Median :395.2
## 4 : 72 1_7 : 142 Mean :407.4
## 5 : 72 1_8 : 142 3rd Qu.:437.5
## 6 : 72 1_9 : 142 Max. :919.0
## (Other):4680 (Other):4260
There are now no NAs in the data, because no one made an error on all instances of one stimulus combination.
We again perform an SBC. The model formula and priors can stay the same.
code = "FAB_agg"
# check if the SBC already exists
if (file.exists(file.path(cache_dir, sprintf("df_res_%s.rds", code)))) {
# load in the results of the SBC
df.results = readRDS(file.path(cache_dir, sprintf("df_res_%s.rds", code)))
df.backend = readRDS(file.path(cache_dir, sprintf("df_div_%s.rds", code)))
dat = readRDS(file.path(cache_dir, sprintf("dat_%s.rds", code)))
} else {
# perform the SBC
gen = SBC_generator_brms(f.fab, data = df.fab, prior = priors, family = shifted_lognormal,
thin = 50, warmup = 20000, refresh = 2000)
bck = SBC_backend_brms_from_generator(gen, chains = 4, thin = 1,
init = 0.1, warmup = warm, iter = iter)
set.seed(468)
if (file.exists(file.path(cache_dir, sprintf("dat_%s.rds", code)))) {
dat = readRDS(file.path(cache_dir, sprintf("dat_%s.rds", code)))
} else {
dat = generate_datasets(gen, nsim)
saveRDS(dat, file.path(cache_dir, sprintf("dat_%s.rds", code)))
}
res = compute_SBC(dat,
bck,
cache_mode = "results",
cache_location = file.path(cache_dir, sprintf("res_%s", code)))
df.results = res$stats
df.backend = res$backend_diagnostics
saveRDS(df.results, file = file.path(cache_dir, paste0("df_res_", code, ".rds")))
saveRDS(df.backend, file = file.path(cache_dir, paste0("df_div_", code, ".rds")))
}
set.seed(4682)
We start by investigating the rhats and the number of divergent samples. This shows that 3 of 250 simulations had at least one parameter that had an rhat of at least 1.05, and 2 models had divergent samples (mean number of samples of the simulations with divergent samples: 17.5). This suggests that this model performs well.
Next, we can plot the simulated values to perform prior predictive checks.
# create a matrix out of generated data
dvname = gsub(" ", "", gsub("[\\|~].*", "", f.fab)[1])
dvfakemat = matrix(NA, nrow(dat[['generated']][[1]]), length(dat[['generated']]))
for (i in 1:length(dat[['generated']])) {
dvfakemat[,i] = dat[['generated']][[i]][[dvname]]
}
truePars = dat$variables
# set large values to a max
dvfakemat[dvfakemat > 2000] = 2000
# compute one histogram per simulated data-set
binwidth = 20
breaks = seq(0, max(dvfakemat, na.rm=T) + binwidth, binwidth)
histmat = matrix(NA, ncol = length(dat), nrow = length(breaks)-1)
for (i in 1:nrow(truePars)) {
histmat[,i] = hist(dvfakemat[,i], breaks = breaks, plot = F)$counts
}
# for each bin, compute quantiles across histograms
probs = seq(0.1, 0.9, 0.1)
quantmat= as.data.frame(matrix(NA, nrow=dim(histmat)[1], ncol = length(probs)))
names(quantmat) = paste0("p", probs)
for (i in 1:dim(histmat)[1]) {
quantmat[i,] = quantile(histmat[i,], p = probs)
}
quantmat$x = breaks[2:length(breaks)] - binwidth/2 # add bin mean
p1 = ggplot(data = quantmat, aes(x = x)) +
geom_ribbon(aes(ymax = p0.9, ymin = p0.1), fill = c_light) +
geom_ribbon(aes(ymax = p0.8, ymin = p0.2), fill = c_light_highlight) +
geom_ribbon(aes(ymax = p0.7, ymin = p0.3), fill = c_mid) +
geom_ribbon(aes(ymax = p0.6, ymin = p0.4), fill = c_mid_highlight) +
geom_line(aes(y = p0.5), colour = c_dark, linewidth = 1) +
labs(title = "Distribution of simulated discriminations", y = "", x = "") +
theme_bw()
tmpM = apply(dvfakemat, 2, mean) # mean
tmpSD = apply(dvfakemat, 2, sd)
p2 = ggplot() +
stat_bin(aes(x = tmpM), fill = c_dark) +
labs(x = "Mean RTs (ms)", title = "Means of simulated RTs") +
theme_bw()
p3 = ggplot() +
stat_bin(aes(x = tmpSD), fill = c_dark) +
labs(x = "SD RTs (ms)", title = "SDs of simulated RTs") +
theme_bw()
p = ggarrange(p1,
ggarrange(p2, p3, ncol = 2, labels = c("B", "C")),
nrow = 2, labels = "A")
annotate_figure(p, top = text_grob("Prior predictive checks: reaction times", face = "bold", size = 14))
Again, this all looks good.
# get simulation numbers with issues
rank = max(df.results$max_rank)
check = merge(df.results %>%
group_by(sim_id) %>% summarise(rhat = max(rhat, na.rm = T), mean_rank = mean(max_rank)) %>%
filter(rhat >= 1.05 | mean_rank != rank),
df.backend %>% filter(n_divergent > 0), all = T)
# plot SBC with functions from the SBC package focusing on population-level parameters
a = 0.5
df.results.b = df.results %>%
filter(substr(variable, 1, 2) == "b_") %>%
filter(!(sim_id %in% check$sim_id))
p1 = plot_ecdf_diff(df.results.b) + theme_bw() + theme(legend.position = "none") +
scale_x_continuous(breaks=scales::pretty_breaks(n = 3)) +
scale_y_continuous(breaks=scales::pretty_breaks(n = 3))
p2 = plot_rank_hist(df.results.b, bins = 20) + theme_bw() +
scale_x_continuous(breaks=scales::pretty_breaks(n = 3)) +
scale_y_continuous(breaks=scales::pretty_breaks(n = 3))
p3 = plot_sim_estimated(df.results.b, alpha = 0.5) + theme_bw() +
scale_x_continuous(breaks=scales::pretty_breaks(n = 3)) +
scale_y_continuous(breaks=scales::pretty_breaks(n = 3))
p4 = plot_contraction(df.results.b, prior_sd = setNames(c(0.15, rep(0.10, length(unique(df.results.b$variable))-1)), unique(df.results.b$variable))) +
theme_bw() +
scale_x_continuous(breaks=scales::pretty_breaks(n = 3)) +
scale_y_continuous(breaks=scales::pretty_breaks(n = 3))
p = ggarrange(p1, p2, p3, p4, labels = "AUTO", ncol = 1, nrow = 4)
annotate_figure(p, top = text_grob("Computational faithfulness and model sensitivity", face = "bold", size = 14))
Next, we check the ranks of the parameters. If the model is unbiased, these should be uniformly distributed (Schad, Betancourt and Vasishth, 2020). The sample empirical cumulative distribution function (ECDF) lies within the theoretical distribution (95%) and the rank histogram also shows ranks within the 95% expected range, although there are some small deviations. We judge this to be acceptable.
Then, we investigated the relationship between the simulated true parameters and the posterior estimates. Although there are individual values diverging from the expected pattern, most parameters were recovered successfully within an uncertainty interval of alpha = 0.05.
Last, we explore the z-score and the posterior contraction of our population-level predictors. The z-score “determines the distance of the posterior mean from the true simulating parameter”, while the posterior contraction “estimates how much prior uncertainty is reduced in the posterior estimation” (Schad, Betancourt and Vasisth, 2020). All of this looks good for this model.
As the next step, we fit the model to the data, check whether there are divergence or rhat issues, and then check whether the chains have converged.
# fit the final model
set.seed(6824)
m.fab = brm(f.fab,
df.fab, prior = priors,
family = shifted_lognormal,
iter = iter, warmup = warm,
backend = "cmdstanr", threads = threading(8),
file = "m_fab_final"
)
rstan::check_hmc_diagnostics(m.fab$fit)
##
## Divergences:
## 0 of 8000 iterations ended with a divergence.
##
## Tree depth:
## 0 of 8000 iterations saturated the maximum tree depth of 10.
##
## Energy:
## E-BFMI indicated no pathological behavior.
# check that rhats are below 1.01
sum(brms::rhat(m.fab) >= 1.01, na.rm = T)
## [1] 1
# check the trace plots
post.draws = as_draws_df(m.fab)
mcmc_trace(post.draws, regex_pars = "^b_",
facet_args = list(ncol = 3)) +
scale_x_continuous(breaks=scales::pretty_breaks(n = 3)) +
scale_y_continuous(breaks=scales::pretty_breaks(n = 3))
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
This model has no pathological behaviour with E-BFMI, no divergent samples and only one rhat that is higher or equal to 1.01. Therefore, we go ahead and perform our posterior predictive checks.
# get posterior predictions
post.pred = posterior_predict(m.fab, ndraws = nsim)
# check the fit of the predicted data compared to the real data
p1 = pp_check(m.fab, ndraws = nsim) +
theme_bw() + theme(legend.position = "none")
# distributions of means compared to the real values per group
p2 = ppc_stat_grouped(df.fab$rt.cor, post.pred, df.fab$diagnosis) +
theme_bw() + theme(legend.position = "none")
# distributions of means compared to the real values per cue
p3 = ppc_stat_grouped(df.fab$rt.cor, post.pred, df.fab$cue) +
theme_bw() + theme(legend.position = "none")
# distributions of means compared to the real values per group for faces
p4 = ppc_stat_grouped(df.fab[df.fab$cue == "face",]$rt.cor, post.pred[,which(df.fab$cue == "face")], df.fab[df.fab$cue == "face",]$diagnosis) +
theme_bw() + theme(legend.position = "none")
# distributions of means compared to the real values per group for objects
p5 = ppc_stat_grouped(df.fab[df.fab$cue == "face",]$rt.cor, post.pred[,which(df.fab$cue == "face")], df.fab[df.fab$cue == "face",]$diagnosis) +
theme_bw() + theme(legend.position = "none")
p = ggarrange(p1, p2, p3, p4, p5,
nrow = 5, ncol = 1, labels = "AUTO")
annotate_figure(p, top = text_grob("Posterior predictive checks: RTs", face = "bold", size = 14))
This model fits our data much better, so we continue with it and can finally have a look at the estimates.
Now that we are convinced that we can trust our model, we have a look at its estimate and use the hypothesis function to assess our hypotheses and perform explorative tests.
# print a summary
summary(m.fab)
## Family: shifted_lognormal
## Links: mu = identity; sigma = identity; ndt = identity
## Formula: rt.cor ~ diagnosis * cue + (cue | subID) + (cue * diagnosis | stm)
## Data: df.fab (Number of observations: 5112)
## Draws: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~stm (Number of levels: 36)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept) 0.02 0.00 0.02 0.03 1.00
## sd(cue1) 0.03 0.00 0.03 0.04 1.00
## sd(diagnosis1) 0.01 0.00 0.00 0.01 1.00
## sd(diagnosis2) 0.00 0.00 0.00 0.01 1.00
## sd(cue1:diagnosis1) 0.00 0.00 0.00 0.01 1.00
## sd(cue1:diagnosis2) 0.00 0.00 0.00 0.01 1.00
## cor(Intercept,cue1) -0.34 0.16 -0.62 -0.01 1.00
## cor(Intercept,diagnosis1) -0.06 0.29 -0.62 0.52 1.00
## cor(cue1,diagnosis1) 0.01 0.29 -0.55 0.58 1.00
## cor(Intercept,diagnosis2) -0.05 0.32 -0.64 0.57 1.00
## cor(cue1,diagnosis2) -0.06 0.32 -0.64 0.58 1.00
## cor(diagnosis1,diagnosis2) -0.03 0.34 -0.66 0.63 1.00
## cor(Intercept,cue1:diagnosis1) 0.04 0.31 -0.57 0.63 1.00
## cor(cue1,cue1:diagnosis1) -0.06 0.31 -0.64 0.55 1.00
## cor(diagnosis1,cue1:diagnosis1) 0.00 0.33 -0.63 0.61 1.00
## cor(diagnosis2,cue1:diagnosis1) 0.01 0.33 -0.62 0.64 1.00
## cor(Intercept,cue1:diagnosis2) -0.22 0.31 -0.74 0.45 1.00
## cor(cue1,cue1:diagnosis2) -0.04 0.30 -0.60 0.55 1.00
## cor(diagnosis1,cue1:diagnosis2) 0.03 0.33 -0.61 0.65 1.00
## cor(diagnosis2,cue1:diagnosis2) 0.05 0.33 -0.59 0.67 1.00
## cor(cue1:diagnosis1,cue1:diagnosis2) -0.11 0.34 -0.72 0.57 1.00
## Bulk_ESS Tail_ESS
## sd(Intercept) 2811 4767
## sd(cue1) 2444 4161
## sd(diagnosis1) 2429 3649
## sd(diagnosis2) 3966 3845
## sd(cue1:diagnosis1) 3663 4576
## sd(cue1:diagnosis2) 3600 3473
## cor(Intercept,cue1) 1405 2806
## cor(Intercept,diagnosis1) 10080 5818
## cor(cue1,diagnosis1) 12563 5682
## cor(Intercept,diagnosis2) 16393 5907
## cor(cue1,diagnosis2) 14307 5894
## cor(diagnosis1,diagnosis2) 8237 6494
## cor(Intercept,cue1:diagnosis1) 14956 6192
## cor(cue1,cue1:diagnosis1) 12163 5932
## cor(diagnosis1,cue1:diagnosis1) 8877 5643
## cor(diagnosis2,cue1:diagnosis1) 6504 6362
## cor(Intercept,cue1:diagnosis2) 9857 6113
## cor(cue1,cue1:diagnosis2) 12204 6197
## cor(diagnosis1,cue1:diagnosis2) 9337 6761
## cor(diagnosis2,cue1:diagnosis2) 7051 6784
## cor(cue1:diagnosis1,cue1:diagnosis2) 5397 6626
##
## ~subID (Number of levels: 71)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.21 0.02 0.18 0.25 1.00 1092 2193
## sd(cue1) 0.02 0.00 0.01 0.02 1.00 3944 4716
## cor(Intercept,cue1) -0.10 0.16 -0.40 0.21 1.00 7694 6615
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 5.43 0.04 5.35 5.51 1.00 790 1869
## diagnosis1 0.03 0.03 -0.02 0.08 1.00 831 1420
## diagnosis2 0.05 0.03 0.00 0.10 1.00 872 1782
## cue1 -0.02 0.01 -0.03 -0.01 1.00 2484 3486
## diagnosis1:cue1 -0.01 0.00 -0.02 -0.00 1.00 6657 6669
## diagnosis2:cue1 0.01 0.00 0.00 0.02 1.00 7003 6474
##
## Further Distributional Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.13 0.00 0.12 0.14 1.00 5187 5264
## ndt 172.73 6.81 158.74 185.58 1.00 5295 4845
##
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# get the estimates and compute groups
df.m.fab = as_draws_df(m.fab) %>%
select(starts_with("b_")) %>%
mutate(
b_COMP = - b_diagnosis1 - b_diagnosis2,
ASD = b_Intercept + b_diagnosis2,
ADHD = b_Intercept + b_diagnosis1,
COMP = b_Intercept + b_COMP
)
# plot the posterior distributions
df.m.fab %>%
select(starts_with("b_")) %>%
pivot_longer(cols = starts_with("b_"), names_to = "coef", values_to = "estimate") %>%
filter(coef != "b_Intercept") %>%
mutate(
coef = case_match(coef,
"b_diagnosis1" ~ "ADHD",
"b_diagnosis2" ~ "ASD",
"b_COMP" ~ "COMP",
"b_cue1" ~ "Face",
"b_diagnosis1:cue1" ~ "Interaction: ADHD",
"b_diagnosis2:cue1" ~ "Interaction: ASD"
),
coef = fct_reorder(coef, desc(coef))
) %>%
group_by(coef) %>%
mutate(
cred = case_when(
(mean(estimate) < 0 & quantile(estimate, probs = 0.975) < 0) |
(mean(estimate) > 0 & quantile(estimate, probs = 0.025) > 0) ~ "credible",
T ~ "not credible"
)
) %>% ungroup() %>%
ggplot(aes(x = estimate, y = coef, fill = cred)) +
geom_vline(xintercept = 0, linetype = 'dashed') +
ggdist::stat_halfeye(alpha = 0.7) + ylab(NULL) + theme_bw() +
scale_fill_manual(values = c(credible = c_dark, c_light)) + theme(legend.position = "none")
# H1a: FAB effect in COMP
h1a = hypothesis(m.fab, "0 < (diagnosis1:cue1 + diagnosis2:cue1 - cue1)")
h1a
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (0)-((diagnosis1:... < 0 -0.02 0.01 -0.03 0 81.47
## Post.Prob Star
## 1 0.99 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
# H1b: ADHD slower than COMP
h1b = hypothesis(m.fab, "0 < 2*diagnosis1 + diagnosis2")
h1b
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (0)-(2*diagnosis1... < 0 -0.11 0.05 -0.19 -0.03 65.12
## Post.Prob Star
## 1 0.98 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
# H1c: ASD slower than COMP
h1c = hypothesis(m.fab, "0 < 2*diagnosis2 + diagnosis1")
h1c
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (0)-(2*diagnosis2... < 0 -0.13 0.05 -0.21 -0.05 194.12
## Post.Prob Star
## 1 0.99 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
# H1d: FAB in ASD decreased
h1d = hypothesis(m.fab, "0 < diagnosis2:cue1")
h1d
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (0)-(diagnosis2:c... < 0 -0.01 0 -0.02 0 265.67
## Post.Prob Star
## 1 1 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
# ... compared to COMP
h1d_comp = hypothesis(m.fab, "0 < 2*diagnosis2:cue1 + diagnosis1:cue1")
h1d_comp
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (0)-(2*diagnosis2... < 0 -0.01 0.01 -0.02 0 7.14
## Post.Prob Star
## 1 0.88
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
# H1e: FAB in ADHD differs from FAB generally (undirected)
h1e = hypothesis(m.fab, "0 > diagnosis1:cue1", alpha = 0.025)
h1e
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (0)-(diagnosis1:c... > 0 0.01 0 0 0.02 1141.86
## Post.Prob Star
## 1 1 *
## ---
## 'CI': 95%-CI for one-sided and 97.5%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 97.5%;
## for two-sided hypotheses, the value tested against lies outside the 97.5%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
# ... compared to COMP
h1e_comp = hypothesis(m.fab, "0 > 2*diagnosis1:cue1 + diagnosis2:cue1", alpha = 0.025)
h1e_comp
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (0)-(2*diagnosis1... > 0 0.01 0.01 0 0.03 73.07
## Post.Prob Star
## 1 0.99 *
## ---
## 'CI': 95%-CI for one-sided and 97.5%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 97.5%;
## for two-sided hypotheses, the value tested against lies outside the 97.5%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
# Exploration
# E1: FAB generally
e1 = hypothesis(m.fab, "cue1 < 0", alpha = 0.025)
e1
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (cue1) < 0 -0.02 0.01 -0.03 -0.01 532.33 1 *
## ---
## 'CI': 95%-CI for one-sided and 97.5%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 97.5%;
## for two-sided hypotheses, the value tested against lies outside the 97.5%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
# E2: FAB effect in ADHD
e2 = hypothesis(m.fab, "0 < -cue1 - diagnosis1:cue1", alpha = 0.025)
e2
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (0)-(-cue1-diagno... < 0 -0.03 0.01 -0.04 -0.02 Inf
## Post.Prob Star
## 1 1 *
## ---
## 'CI': 95%-CI for one-sided and 97.5%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 97.5%;
## for two-sided hypotheses, the value tested against lies outside the 97.5%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
# E3: FAB effect in ASD
e3 = hypothesis(m.fab, "0 < -cue1 - diagnosis2:cue1", alpha = 0.025)
e3
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (0)-(-cue1-diagno... < 0 -0.01 0.01 -0.02 0 8.71
## Post.Prob Star
## 1 0.9
## ---
## 'CI': 95%-CI for one-sided and 97.5%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 97.5%;
## for two-sided hypotheses, the value tested against lies outside the 97.5%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
# E4: ADHD differ from ASD in their reaction time
e4 = hypothesis(m.fab, "diagnosis1 < diagnosis2", alpha = 0.025)
e4
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (diagnosis1)-(dia... < 0 -0.02 0.04 -0.1 0.06 2.4
## Post.Prob Star
## 1 0.71
## ---
## 'CI': 95%-CI for one-sided and 97.5%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 97.5%;
## for two-sided hypotheses, the value tested against lies outside the 97.5%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
# E5: FAB in ADHD differs from FAB in ASD
e5 = hypothesis(m.fab, "diagnosis1:cue1 < diagnosis2:cue1", alpha = 0.025)
e5
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (diagnosis1:cue1)... < 0 -0.02 0.01 -0.03 -0.01 7999
## Post.Prob Star
## 1 1 *
## ---
## 'CI': 95%-CI for one-sided and 97.5%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 97.5%;
## for two-sided hypotheses, the value tested against lies outside the 97.5%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
# extract predicted differences in ms instead of log data
df.new = df.fab %>%
select(diagnosis, cue) %>%
distinct() %>%
mutate(
condition = paste(diagnosis, cue, sep = "_")
)
df.ms = as.data.frame(
fitted(m.fab, summary = F,
newdata = df.new %>% select(diagnosis, cue),
re_formula = NA))
colnames(df.ms) = df.new$condition
# calculate our difference columns
df.ms = df.ms %>%
mutate(
COMP = (COMP_face + COMP_object)/2,
ADHD = (ADHD_face + ADHD_object)/2,
ASD = (ASD_object + ASD_face)/2,
FAB = (ADHD_object + ASD_object + COMP_object - ADHD_face - ASD_face - COMP_face)/3,
h1a = COMP_object - COMP_face,
h1b = COMP - ADHD,
h1c = COMP - ASD,
h1d_comp = (ASD_object - ASD_face) - (h1a),
h1d = (ASD_object - ASD_face) - FAB,
h1e = (ADHD_object - ADHD_face) - (h1a),
e4 = ADHD - ASD,
e5 = (ADHD_object - ADHD_face) - (ASD_object - ASD_face)
)
Our Bayesian linear mixed model with the median of correct reaction times as the outcome and diagnostic status, cue (face or object) and their interaction confirmed a face attention bias in our comparison group: COMP participants reacted faster in response to targets appearing on the side of the face compared to targets appearing on the side of the object (CI of COMP(object) - COMP(face): 0.95 to 13ms, posterior probability = 98.79%). Both clinical groups exhibited increased overall reaction times compared with the COMP group (CI of COMP - ADHD: -46.47 to -2.43ms, posterior probability = 98.49%; CI of COMP - ASD: -51.38 to -7.49ms, posterior probability = 99.49%). Furthermore, while face attention bias was not credibly decreased in ASD participants compared to COMP participants (CI of ASD(FAB) - COMP(FAB): -8.29 to 3.33ms, posterior probability = 87.71%), it is credibly decreased compared to the average face attention bias in our sample (CI of ASD(FAB) - average FAB: -7.62 to -0.82ms, posterior probability = 99.62%). This effect is also reflected in the face attention bias in the ADHD group which differs credibly from face attention bias in COMP participants (CI of ADHD(FAB) - COMP(FAB): 1.8 to 13.48ms, posterior probability = 98.65%).
Furthermore, our exploration of the data revealed a credible face attention bias across all groups (CI of object - face: 3.09 to 14.27ms, posterior probability = 99.81%). Last, we explored differences between our clinical groups. While there were no credible differences in overall reaction times between adults with ASD and adults with ADHD in our data (CI of ADHD - ASD: -24.61 to 14.2ms, posterior probability = 70.62%), there was a credible difference between face attention bias in the clinical groups (CI of ASD(FAB) - ADHD(FAB): 4.26 to 16.24ms, posterior probability = 99.99%).
# overall median reaction times
df.fab %>%
group_by(subID, diagnosis, cue) %>%
summarise(
rt.cor = mean(rt.cor, na.rm = T)
) %>%
ggplot(aes(diagnosis, rt.cor, fill = cue, colour = cue)) + #
geom_rain(rain.side = 'r',
boxplot.args = list(color = "black", outlier.shape = NA, show_guide = FALSE, alpha = 0.5),
violin.args = list(color = "black", outlier.shape = NA, alpha = 0.5),
boxplot.args.pos = list(
position = ggpp::position_dodgenudge(x = 0, width = 0.3), width = 0.3
),
point.args = list(show_guide = FALSE, alpha = .5),
violin.args.pos = list(
width = 0.6, position = position_nudge(x = 0.16)),
point.args.pos = list(position = ggpp::position_dodgenudge(x = -0.25, width = 0.1))) +
ylim(0, 620) +
scale_fill_manual(values = custom.col) +
scale_color_manual(values = custom.col) +
labs(title = "Median reaction times per subject", x = "", y = "reaction time (ms)") +
theme_bw() +
theme(legend.position = "bottom", plot.title = element_text(hjust = 0.5), legend.direction = "horizontal", text = element_text(size = 15))
# focus on the difference in reaction times
df.fab %>%
group_by(subID, diagnosis, cue) %>%
summarise(
rt.cor = mean(rt.cor, na.rm = T)
) %>%
group_by(subID, diagnosis) %>%
arrange(subID, diagnosis, cue) %>%
summarise(FAB = diff(rt.cor[1:2])) %>%
ggplot(aes(diagnosis, FAB, fill = diagnosis, colour = diagnosis)) + #
geom_rain(rain.side = 'r',
boxplot.args = list(color = "black", outlier.shape = NA, show_guide = FALSE, alpha = 0.5),
violin.args = list(color = "black", outlier.shape = NA, alpha = 0.5),
boxplot.args.pos = list(
position = ggpp::position_dodgenudge(x = 0, width = 0.3), width = 0.3
),
point.args = list(show_guide = FALSE, alpha = .5),
violin.args.pos = list(
width = 0.6, position = position_nudge(x = 0.16)),
point.args.pos = list(position = ggpp::position_dodgenudge(x = -0.25, width = 0.1))) +
geom_hline(yintercept = 0) +
scale_fill_manual(values = custom.col) +
scale_color_manual(values = custom.col) +
labs(title = "Face attention bias (median RT object - face) per subject", x = "", y = "fab (ms)") +
theme_bw() +
theme(legend.position = "bottom", plot.title = element_text(hjust = 0.5), legend.direction = "horizontal", text = element_text(size = 15))
To complement our hypothesis testing using brms::hypothesis(), we perform a Bayes Factor (BF) analysis. The BF is the ratio of the marginal likelihoods of the data given two models. We will compare models containing different combinations of population-level effects to the model only containing the intercept on the population-level and all group-level effects. The BF depends on the priors that were used, because it indicates a change in our belief after seeing the data. Therefore, we perform a sensitivity analysis comparing the BF based on our chosen priors with narrower and wider priors.
# set the directory in which to save results
sense_dir = file.path(getwd(), "_brms_sens_cache")
main.code = "fab"
# describe priors
pr.descriptions = c("chosen",
"sdx1.25", "sdx1.5", "sdx2", "sdx4", "sdx6", "sdx8", "sdx10",
"sdx0.875", "sdx0.75", "sdx0.5", "sdx0.25", "sdx0.167", "sdx0.125", "sdx0.1"
)
# check which have been run already
if (file.exists(file.path(sense_dir, sprintf("df_%s_bf.csv", main.code)))) {
pr.done = read_csv(file.path(sense_dir, sprintf("df_%s_bf.csv", main.code)), show_col_types = F) %>%
select(priors) %>% distinct()
pr.descriptions = pr.descriptions[!(pr.descriptions %in% pr.done$priors)]
}
if (length(pr.descriptions) > 0) {
# rerun the model with more iterations for bridgesampling
m.fab.bf = brm(f.fab,
df.fab, prior = priors,
family = shifted_lognormal,
iter = 40000, warmup = 10000,
backend = "cmdstanr", threads = threading(8),
file = "m_fab_bf", silent = 2,
save_pars = save_pars(all = TRUE)
)
}
# loop through them
for (pr.desc in pr.descriptions) {
tryCatch({
# use function to compute BF with the described priors
bf_sens_2int(m.fab.bf, "diagnosis", "cue", pr.desc,
main.code, # prefix for all models and MLL
file.path(sense_dir, "log_FAB.txt"), # log file
sense_dir # where to save the models and MLL
)
},
error = function(err) {
message(sprintf("Error for %s: %s", pr.desc, err))
}
)
}
# read in the results
df.fab.bf = read_csv(file.path(sense_dir, sprintf("df_%s_bf.csv", main.code)), show_col_types = F)
# check the sensitivity analysis result per model
df.fab.bf %>%
filter(`population-level` != "1") %>%
mutate(
sd = as.factor(case_when(
priors == "chosen" ~ "1",
substr(priors, 1, 3) == "sdx" ~ gsub("sdx", "", priors),
T ~ priors)
),
order = case_when(
priors == "chosen" ~ 1,
substr(priors, 1, 3) == "sdx" ~ as.numeric(gsub("sdx", "", priors)),
T ~ 999),
sd = fct_reorder(sd, order)
) %>%
ggplot(aes(y = bf.log, x = sd, group = `population-level`, colour = `population-level`)) +
geom_point() +
geom_line() +
geom_vline(xintercept = "1") +
geom_hline(yintercept = 0) +
ggtitle("Sensitivity analysis with the intercept-only model as reference") +
scale_colour_manual(values = custom.col) +
theme_bw() +
theme(legend.position = "bottom", axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Quite a few models outperform the intercept model unless very wide priors are used. Therefore, we plot the models containing predictors with the model containing both predictors but no interaction as the reference model.
# compare to main effects model as reference
df.fab.bf %>%
filter(`population-level` != "1") %>%
group_by(priors) %>%
mutate(bf.log = bf.log - bf.log[`population-level` == "diagnosis + cue"]) %>%
ungroup() %>%
mutate(
sd = as.factor(case_when(
priors == "chosen" ~ "1",
substr(priors, 1, 3) == "sdx" ~ gsub("sdx", "", priors),
T ~ priors)
),
order = case_when(
priors == "chosen" ~ 1,
substr(priors, 1, 3) == "sdx" ~ as.numeric(gsub("sdx", "", priors)),
T ~ 999),
sd = fct_reorder(sd, order)
) %>%
ggplot(aes(y = bf.log, x = sd, group = `population-level`, colour = `population-level`)) +
geom_point() +
geom_line() +
geom_vline(xintercept = "1") +
theme_bw() +
ggtitle("Sensitivity analysis with the 'diagnosis + cue' model as reference") +
scale_colour_manual(values = custom.col) +
theme(legend.position = c(0.2, 0.25), axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Here, we can see that the model containing both predictors and their interaction consistently outperforms the other models for narrow and medium priors. For wider priors, the results are less consistent.
# create a data frame with the comparisons
kable(df.fab.bf %>% filter(priors == "chosen") %>% select(-priors) %>%
filter(`population-level` != "1") %>% arrange(desc(bf.log)), digits = 3)
| population-level | bf.log |
|---|---|
| diagnosis * cue | 5.686 |
| diagnosis + cue | 4.768 |
| diagnosis | 2.471 |
| cue | 2.283 |
The comparison of the models reveals the model containing both main effects and their interaction on the population-level to be the best model as measured by the BF when using our chosen priors.
Specifically, there is anecdotal evidence in favour of this model underlying the data observed compared to the model not including the interaction (log(BF) = 0.918), very strong evidence in favour of this model compared to the model including only the predictor cue (log(BF) = 3.403), strong evidence in favour of this model compared to the one only including the predictor diagnosis (log(BF) = 3.215) as well as extreme evidence in favour of this model compared to the model only including the intercept on the population-level (log(BF) = 5.686).
Last but not least, we are going to explore possible differences with regards to mean accuracies using a Bayesian ANOVA
# load the original, unaggregated data
load("FAB_data.RData")
# aggregate mean accuracy per condition and person
df.acc = df.fab %>%
group_by(subID, diagnosis, cue) %>%
summarise(acc = mean(acc)*100)
# check normal distribution
df.acc %>% group_by(diagnosis, cue) %>%
shapiro_test(acc) %>%
mutate(
sig = if_else(p < 0.05, "*", "")
)
## # A tibble: 6 × 6
## diagnosis cue variable statistic p sig
## <fct> <fct> <chr> <dbl> <dbl> <chr>
## 1 ADHD face acc 0.504 0.0000000940 *
## 2 ADHD object acc 0.621 0.00000154 *
## 3 ASD face acc 0.901 0.0230 *
## 4 ASD object acc 0.791 0.000213 *
## 5 COMP face acc 0.814 0.000500 *
## 6 COMP object acc 0.787 0.000180 *
# rank transform the data
df.acc = df.acc %>%
ungroup() %>%
mutate(
racc = rank(acc)
)
# run the ANOVA
aov.acc = anovaBF(racc ~ diagnosis*cue, data = df.acc)
extractBF(aov.acc, logbf = T)
## bf error time
## diagnosis -1.021576 0.0002886018 Tue Sep 17 10:47:42 2024
## cue -1.708827 0.0004360949 Tue Sep 17 10:47:42 2024
## diagnosis + cue -2.730620 0.0196305266 Tue Sep 17 10:47:42 2024
## diagnosis + cue + diagnosis:cue -4.648269 0.0168901727 Tue Sep 17 10:47:42 2024
## code
## diagnosis 89e14beced84
## cue 89e142e660f2
## diagnosis + cue 89e16cb52a92
## diagnosis + cue + diagnosis:cue 89e11c00d309
# print overall accuracy rates
df.acc %>%
group_by(diagnosis, cue) %>%
summarise(mean_accuracy = mean(acc, na.rm = T), sd_accuracy = sd(acc, na.rm = T))
## # A tibble: 6 × 4
## # Groups: diagnosis [3]
## diagnosis cue mean_accuracy sd_accuracy
## <fct> <fct> <dbl> <dbl>
## 1 ADHD face 96.6 4.64
## 2 ADHD object 96.4 4.29
## 3 ASD face 96.9 2.73
## 4 ASD object 97.6 2.02
## 5 COMP face 97.8 2.00
## 6 COMP object 97.7 2.11
df.acc %>%
group_by(diagnosis) %>%
summarise(mean_accuracy = mean(acc, na.rm = T), sd_accuracy = sd(acc, na.rm = T))
## # A tibble: 3 × 3
## diagnosis mean_accuracy sd_accuracy
## <fct> <dbl> <dbl>
## 1 ADHD 96.5 4.42
## 2 ASD 97.3 2.40
## 3 COMP 97.8 2.04
Accuracies were generally high, with a grand average of 97.21% accurate responses across diagnostic groups. None of the models outperformed the intercept-only model, therefore, we conclude that there were no differences between diagnostic groups, cues or their interaction with regards to accuracies.
# overall accuracies
df.acc %>%
ggplot(aes(diagnosis, acc, fill = cue, colour = cue)) + #
geom_rain(rain.side = 'r',
boxplot.args = list(color = "black", outlier.shape = NA, show_guide = FALSE, alpha = 0.5),
violin.args = list(color = "black", outlier.shape = NA, alpha = 0.5),
boxplot.args.pos = list(
position = ggpp::position_dodgenudge(x = 0, width = 0.3), width = 0.3
),
point.args = list(show_guide = FALSE, alpha = .5),
violin.args.pos = list(
width = 0.6, position = position_nudge(x = 0.16)),
point.args.pos = list(position = ggpp::position_dodgenudge(x = -0.25, width = 0.1))) +
ylim(0, 100) +
scale_fill_manual(values = custom.col) +
scale_color_manual(values = custom.col) +
labs(title = "Mean accuracies per subject", x = "", y = "percent") +
theme_bw() +
theme(legend.position = "bottom", plot.title = element_text(hjust = 0.5), legend.direction = "horizontal", text = element_text(size = 15))